This is an R Markdown document that details one workflow to fire up an EC2 instance, run RStudio on that instance, and start a Docker image that has all of the spatial dependencies needed for GIS work on AWS. This workflow is specific to Earth Lab, but it can be adapted for any lab group. It mixes steps in the AWS user interface with commands typed in a terminal. Let’s get started.

Log in

First things first: log in to your AWS account and click on the EC2 icon under Compute.

Step 1: Choosing an AMI

The first key step is to launch the correct Amazon Machine Image (AMI). The Quick Start tab lists many AMIs that sit on various operating systems, and depending on what you will be running, any of them may be the appropriate choice. For this tutorial we will use a publicly available Docker image produced by the Earth Lab, which sits on an Ubuntu OS. For anyone who has the Earth Lab credentials, this image will automatically be visible. Select it to move on (highlighted in red).

Step 2: Choosing an Instance type

This step takes more thought, because the right EC2 instance type depends entirely on the processing you hope to accomplish on AWS. After you have decided how much processing power you need, verify that it is reasonably priced. There is quite a bit of literature out there that explains the different types of instances and how to choose one for the task you want to perform.

Generally, the two main variables you will be interested in are the number of cores (vCPUs) and the amount of memory. If you want to parallelize your code, the number of cores will be critical to running the program efficiently. If you have large data sets, you may want to opt for more memory. And, of course, your work may combine those needs or call for something else entirely. The main point here is to make an informed choice.
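
Once an instance is up and you are logged in to it, a quick check like the one below can confirm that it provides the cores and memory you expected. This is only a sketch for a Linux instance such as the Ubuntu one used here; it relies on `nproc` and `/proc/meminfo`, and the variable names are illustrative.

```shell
# Run on the instance itself (Linux only).
cores=$(nproc)                                         # number of available CPU cores
mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)  # total RAM in kB
mem_gb=$(( mem_kb / 1024 / 1024 ))                     # whole GiB, rounded down

echo "cores: ${cores}"
echo "memory: ${mem_gb} GiB"
```

If the numbers are lower than your analysis needs, it is usually easier to stop and pick a larger instance type than to work around the limits in code.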

Step 3: Configuring the Instance

This is a step that you can easily bypass. Leave all as default for now.

Step 4: Configuring the Storage

Leave everything as default for now. 10 GB should be fine for most tasks, but depending on your tasks and the data sets involved you may want to increase the storage. Here is more information on how to choose the type and amount of storage, if you want to increase the capacity.

Step 5: Add project tags

Tags are very important and tremendously helpful, especially when many other groups use the same AWS account or you have multiple instances running for different projects. A short and sweet description of what the instance will be used for is all that is necessary during this step. It instills a best practice and will save you some gray hairs down the line.

Step 6: Adding Rules

This is probably the most critical part of the whole workflow. RStudio Server inside our spatial Docker image listens on port 8787, so that port must be open on the EC2 instance before your browser can reach it. To open it, click “Add Rule” and choose the “Custom TCP” type. Leave the “SSH” rule as default. Then set the port for the new “Custom TCP” rule to 8787.
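
The same rule can also be added from a terminal with the AWS CLI, for anyone who prefers that to the console. The sketch below only prints the command so you can review it first. It assumes the AWS CLI is installed and configured, and the security group ID and IP range are hypothetical placeholders, not values from this workflow.

```shell
# Print the AWS CLI command that opens TCP port 8787 (RStudio Server).
# $1 = security group ID, $2 = CIDR range allowed to connect
build_ingress_cmd() {
  printf 'aws ec2 authorize-security-group-ingress --group-id %s --protocol tcp --port 8787 --cidr %s\n' "$1" "$2"
}

# Hypothetical values: replace with your own group ID and IP range.
build_ingress_cmd "sg-0123456789abcdef0" "203.0.113.0/24"
```

Limiting the rule to your own IP range, rather than opening it to everyone, keeps the RStudio login page from being reachable by the whole internet.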

Step 7: Creating an SSH key pair

Upon your first EC2 instance launch, you will create a key pair file (.pem), which is effectively the key you use to connect to your EC2 instances remotely. Once it is generated, put it somewhere on your computer that is out of the way and safe. I recommend something like MyDocs>AWS>file-name.pem. I would also suggest giving it a name that you will recognize and that is specific to you (name, initials, etc.). After that, every other time you create an instance, all you need to do is select the .pem file from the drop-down menu. Easy!

Step 8: Successfully launched an EC2 instance

Congratulations, you have successfully launched your first EC2 instance on AWS. A screen will appear with the name of the instance and all pertinent information associated with it. In the screenshot below, make note of the public DNS associated with this particular EC2 instance. We will need it in the next steps.

ec2-52-32-104-200.us-west-2.compute.amazonaws.com

Install and prep for docker image

If you do not already have a Docker Hub account, head over there and create one. Then download Docker to your machine and sign in to the Docker application with your Docker Hub account. If you have a Mac, it will be found in the top panel (see below).

Terminal time

Install and load the Docker container: r-spatial-aws

docker pull earthlab/r-spatial-aws
docker run -i -t earthlab/r-spatial-aws /bin/bash

Now we need to access our .pem file, ssh into our EC2 instance, and launch our Docker container containing all of the spatial libraries needed for analysis.

Now in terminal type:

chmod 600 /point/to/path/*.pem
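
The permission change matters because ssh refuses to use a private key that other users can read. A small check like the one below can confirm the key is restricted before you try to connect. It is a sketch that assumes GNU `stat` (Linux); on macOS, `stat -f '%Lp'` is the usual equivalent. The key path is a hypothetical placeholder.

```shell
# Succeed only if the key file is accessible by its owner alone (mode 600 or 400).
key_is_private() {
  mode=$(stat -c '%a' "$1")   # octal permission bits, e.g. 600
  [ "$mode" = "600" ] || [ "$mode" = "400" ]
}

# Hypothetical path: point KEY at your own .pem file.
KEY="$HOME/AWS/my-key.pem"
if [ -f "$KEY" ] && ! key_is_private "$KEY"; then
  chmod 600 "$KEY"            # tighten permissions if they are too open
fi
```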

Then:

ssh -i /point/to/path/*.pem ubuntu@ec2-52-32-104-200.us-west-2.compute.amazonaws.com

You will need to change the ec2-52-32-104-200.us-west-2.compute.amazonaws.com based on your particular instance.

Now let’s launch the Docker container on top of our EC2 instance by:

docker run -d -p 8787:8787 earthlab/r-spatial-aws

Final Step

Now that we have successfully created our EC2 instance, with our spatial Docker container running on top of it, let’s kick it off!

  1. Open your favorite web browser

  2. Copy your Public DNS and append :8787 to it. It should look like:

    ec2-52-32-104-200.us-west-2.compute.amazonaws.com:8787

  3. You will now see a login prompt to enter RStudio

    username: rstudio
    password: rstudio

Voila! You are now running a version of RStudio wrapped in a Docker container, all on AWS!
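
If the login page does not load, it can help to build the address in a terminal and test it from there. The sketch below assembles the URL from a public DNS name. The helper name `rstudio_url` is made up for illustration, and the optional `curl` check is left commented out because it needs network access to a running instance.

```shell
# Build the RStudio address from an instance's public DNS name.
rstudio_url() {
  printf 'http://%s:8787\n' "$1"
}

# Example with the public DNS from this walkthrough; substitute your own.
url=$(rstudio_url "ec2-52-32-104-200.us-west-2.compute.amazonaws.com")
echo "$url"

# Optional reachability check (requires curl and a running instance):
# curl --silent --head --max-time 10 "$url" | head -n 1
```

If the request times out, the usual suspects are the port 8787 rule from Step 6 or a container that is not running.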

Some additional documents:

Matt Strimas-Mackey blog 1 and 2